feat: 2.5x faster model generation (and 2.5x lower model costs) #70
Hello!
🔥 work on the companion app!
I've recently been hacking on low-latency inference on Replicate. This PR swaps out the default Vicuna model for one with ~2.5x faster generation (depending on the sequence length).
It's a standard Replicate model, so all we need to do is swap out the model string: https://replicate.com/moinnadeem/fastervicuna_13b
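For reference, here's a minimal sketch of what the swap could look like with the Replicate Python client (the companion app itself may call Replicate differently, e.g. via the JS client). The version hash and input parameters below are placeholders, not values from this PR:

```python
import replicate

# Swap the default Vicuna model string for the faster variant from this PR.
# "moinnadeem/fastervicuna_13b" is the model referenced above; the version
# hash is a placeholder -- use the latest one from the model page.
MODEL = "moinnadeem/fastervicuna_13b:<version-hash>"

# Input parameters are illustrative; check the model page for the actual schema.
output = replicate.run(
    MODEL,
    input={
        "prompt": "You are a helpful companion. User: Hi there!",
        "max_length": 500,   # longer sequences see a bigger speedup
        "temperature": 0.75,
    },
)

# The client streams generated tokens as an iterator.
print("".join(output))
```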
We could also work on getting these changes upstreamed to the main Replicate model.
WIP:
Things to test: